 anticorrelated noise injection


Papers Simplified: »Anticorrelated Noise Injection for Improved Generalization«

#artificialintelligence

In this article, I will not explain all of the (exciting!) details of the paper. Instead, I will provide some implementations and pictures that should make it possible to understand the gist of it. I also did my best to implement the optimizers mentioned in the paper, but use the code with care, because I am not an expert in this area. To understand what Anti-PGD (Anti-Perturbed Gradient Descent) is about, let us briefly recap how GD and derived algorithms such as SGD and PGD work. Let us assume that we want to minimize a function f whose gradient is denoted ∇f(θ).
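To make the recap concrete, here is a minimal sketch of plain GD and of PGD with uncorrelated Gaussian noise. It is not the article's or the paper's code. The toy objective f(θ) = (θ - 3)², the function names `grad_f`, `gd` and `pgd`, and the parameters `lr`, `sigma` and `seed` are all assumptions chosen for illustration.

```python
import random


def grad_f(theta):
    # Gradient of the assumed toy objective f(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)


def gd(theta, lr=0.1, steps=100):
    # Plain gradient descent: theta <- theta - lr * grad_f(theta).
    for _ in range(steps):
        theta = theta - lr * grad_f(theta)
    return theta


def pgd(theta, lr=0.1, steps=100, sigma=0.01, seed=0):
    # Perturbed gradient descent: the GD step plus a fresh,
    # independent (uncorrelated) Gaussian perturbation at every step.
    rng = random.Random(seed)
    for _ in range(steps):
        theta = theta - lr * grad_f(theta) + rng.gauss(0.0, sigma)
    return theta


if __name__ == "__main__":
    print("GD: ", gd(0.0))
    print("PGD:", pgd(0.0))
```

On this toy problem both methods end up close to the minimizer θ = 3. With `sigma=0.0` the PGD loop reduces exactly to GD, which is a handy sanity check.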


Anticorrelated Noise Injection for Improved Generalization

arXiv.org Machine Learning

Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating the perturbations of consecutive PGD steps. We consider a variety of objective functions for which we find that GD with anticorrelated perturbations ("Anti-PGD") generalizes significantly better than GD and standard (uncorrelated) PGD. To support these experimental findings, we also derive a theoretical analysis that demonstrates that Anti-PGD moves to wider minima, while GD and PGD remain stuck in suboptimal regions or even diverge. This new connection between anticorrelated noise and generalization opens the field to novel ways to exploit noise for training machine learning models.
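The idea of anticorrelated perturbations can be sketched in a few lines. The version below assumes that the perturbation at each step is the difference of two consecutive i.i.d. Gaussian draws, ξ_{k+1} - ξ_k, which is my reading of the Anti-PGD update and should be checked against the paper. The function name `anti_pgd`, its parameters, the choice ξ_0 = 0 and the scalar setting are assumptions made for illustration.

```python
import random


def anti_pgd(grad, theta, lr=0.1, steps=100, sigma=0.01, seed=0):
    # Sketch of GD with anticorrelated perturbations (assumed form):
    # theta <- theta - lr * grad(theta) + (xi_new - xi_prev),
    # where the xi are i.i.d. Gaussian draws with standard deviation sigma.
    rng = random.Random(seed)
    xi_prev = 0.0  # assumption: start with xi_0 = 0
    for _ in range(steps):
        xi_new = rng.gauss(0.0, sigma)
        theta = theta - lr * grad(theta) + (xi_new - xi_prev)
        xi_prev = xi_new
    return theta


if __name__ == "__main__":
    # Assumed toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
    print(anti_pgd(lambda t: 2.0 * (t - 3.0), 0.0))
    # With a zero gradient the perturbations telescope, so the iterate
    # stays within a single noise draw of its starting point.
    print(anti_pgd(lambda t: 0.0, 1.0, steps=1000))
```

Consecutive perturbations share a draw with opposite signs, which is what makes them anticorrelated. On a flat region the sum of all perturbations collapses to the last draw minus the first, whereas independent perturbations would accumulate like a random walk.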